Create bar or line charts showing monthly total kBTUs of residential and commercial electricity and gas consumption for the Bay Area (meaning the sum of all ZIP codes in the 9 Bay Area counties) from 2017 to the latest available month (meaning a 42-column version of the plots from section 1.8). Look online for the correct conversion of kWhs to kBTUs and therms to kBTUs. Make sure that electricity and gas data are distinguishable but plotted in the same chart; feel free to separate your analyses for residential and commercial into two separate charts if you believe it improves legibility, or do one chart with 4 colors in the legend. Comment on any observable changes in energy consumption that may be attributable to the COVID-19 pandemic. Create at least one map that highlights which neighborhoods experienced the greatest change in electricity consumption before and after the pandemic began, and comment on your results. Explain any key assumptions you made in the analysis, or caveats about the data sources that you think the reader should be aware of. Publish all of this work in a web report.
The following libraries and packages are used in this analysis:
library(tidyverse)
library(ggplot2)
library(leaflet)
library(tigris)
library(sf)
library(leaflet)
library(plotly)
library(zoo)
library(knitr)
The following code imports electricity usage datasets from Pacific Gas & Electric (PG&E) between the years 2017 and 2020, located at: https://pge-energydatarequest.com/public_datasets
years <- 2017:2020
quarters <- 1:4
types <- "Electric"
pge_elec <- NULL
for(quarter in quarters){
for(year in years){
for(type in types){
filename <- paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
if(year == 2020 && quarter %in% 3:4){
next
}
print(filename)
temp <- read_csv(filename)
}
pge_elec <- rbind(pge_elec, temp)
saveRDS(pge_elec, "pge_elec.rds")
}
}
## [1] "PGE_2017_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2020_Q1_ElectricUsageByZip.csv"
## [1] "PGE_2017_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2020_Q2_ElectricUsageByZip.csv"
## [1] "PGE_2017_Q3_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q3_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q3_ElectricUsageByZip.csv"
## [1] "PGE_2017_Q4_ElectricUsageByZip.csv"
## [1] "PGE_2018_Q4_ElectricUsageByZip.csv"
## [1] "PGE_2019_Q4_ElectricUsageByZip.csv"
#Conversion factors to convert KWH to kBTU
pge_elec$TOTALKBTU <- pge_elec$TOTALKWH*3.412
pge_elec$AVERAGEKBTU <- pge_elec$AVERAGEKWH*3.412
pge_elec$TOTALKWH <- NULL
pge_elec$AVERAGEKWH <- NULL
pge_elec
## # A tibble: 128,319 x 8
## ZIPCODE MONTH YEAR CUSTOMERCLASS COMBINED TOTALCUSTOMERS TOTALKBTU
## <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 93117 1 2017 Elec- Agricu~ Y 0 0
## 2 93117 2 2017 Elec- Agricu~ Y 0 0
## 3 93117 3 2017 Elec- Agricu~ Y 0 0
## 4 93203 1 2017 Elec- Agricu~ Y 0 0
## 5 93203 2 2017 Elec- Agricu~ Y 0 0
## 6 93203 3 2017 Elec- Agricu~ Y 158 17200612.
## 7 93204 1 2017 Elec- Agricu~ Y 0 0
## 8 93204 2 2017 Elec- Agricu~ Y 0 0
## 9 93204 3 2017 Elec- Agricu~ Y 0 0
## 10 93206 1 2017 Elec- Agricu~ Y 302 30058041.
## # ... with 128,309 more rows, and 1 more variable: AVERAGEKBTU <dbl>
Now, I import gas usage data from Pacific Gas & Electric (PG&E) between the years 2017 and 2020, located at: https://pge-energydatarequest.com/public_datasets
years <- 2017:2020
quarters <- 1:4
types <- "Gas"
pge_gas <- NULL
for(quarter in quarters){
for(year in years){
for(type in types){
filename <- paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
if(year == 2020 && quarter %in% 3:4){
next
}
print(filename)
temp <- read_csv(filename)
}
pge_gas <- rbind(pge_gas, temp)
saveRDS(pge_gas, "pge_gas.rds")
}
}
## [1] "PGE_2017_Q1_GasUsageByZip.csv"
## [1] "PGE_2018_Q1_GasUsageByZip.csv"
## [1] "PGE_2019_Q1_GasUsageByZip.csv"
## [1] "PGE_2020_Q1_GasUsageByZip.csv"
## [1] "PGE_2017_Q2_GasUsageByZip.csv"
## [1] "PGE_2018_Q2_GasUsageByZip.csv"
## [1] "PGE_2019_Q2_GasUsageByZip.csv"
## [1] "PGE_2020_Q2_GasUsageByZip.csv"
## [1] "PGE_2017_Q3_GasUsageByZip.csv"
## [1] "PGE_2018_Q3_GasUsageByZip.csv"
## [1] "PGE_2019_Q3_GasUsageByZip.csv"
## [1] "PGE_2017_Q4_GasUsageByZip.csv"
## [1] "PGE_2018_Q4_GasUsageByZip.csv"
## [1] "PGE_2019_Q4_GasUsageByZip.csv"
pge_gas$TOTALKBTU <- pge_gas$TOTALTHM*100
pge_gas$AVERAGEKBTU <- pge_gas$AVERAGETHM*100
pge_gas$TOTALTHM <- NULL
pge_gas$AVERAGETHM <- NULL
pge_gas
## # A tibble: 57,038 x 8
## ZIPCODE MONTH YEAR CUSTOMERCLASS COMBINED TOTALCUSTOMERS TOTALKBTU
## <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 92304 1 2017 Gas- Commerc~ Y 0 0
## 2 92304 2 2017 Gas- Commerc~ Y 0 0
## 3 92304 3 2017 Gas- Commerc~ Y 0 0
## 4 92365 1 2017 Gas- Commerc~ Y 0 0
## 5 92365 2 2017 Gas- Commerc~ Y 0 0
## 6 92365 3 2017 Gas- Commerc~ Y 0 0
## 7 93203 1 2017 Gas- Commerc~ Y 0 0
## 8 93203 2 2017 Gas- Commerc~ Y 0 0
## 9 93203 3 2017 Gas- Commerc~ Y 0 0
## 10 93204 1 2017 Gas- Commerc~ Y 0 0
## # ... with 57,028 more rows, and 1 more variable: AVERAGEKBTU <dbl>
Binding the two datasets together:
pge_elec_gas <- rbind(pge_gas, pge_elec)
pge_elec_gas
## # A tibble: 185,357 x 8
## ZIPCODE MONTH YEAR CUSTOMERCLASS COMBINED TOTALCUSTOMERS TOTALKBTU
## <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 92304 1 2017 Gas- Commerc~ Y 0 0
## 2 92304 2 2017 Gas- Commerc~ Y 0 0
## 3 92304 3 2017 Gas- Commerc~ Y 0 0
## 4 92365 1 2017 Gas- Commerc~ Y 0 0
## 5 92365 2 2017 Gas- Commerc~ Y 0 0
## 6 92365 3 2017 Gas- Commerc~ Y 0 0
## 7 93203 1 2017 Gas- Commerc~ Y 0 0
## 8 93203 2 2017 Gas- Commerc~ Y 0 0
## 9 93203 3 2017 Gas- Commerc~ Y 0 0
## 10 93204 1 2017 Gas- Commerc~ Y 0 0
## # ... with 185,347 more rows, and 1 more variable: AVERAGEKBTU <dbl>
The following code sets up the Bay Area counties, cities, zip codes, and associated geometry:
bay_county_names <-
c(
"Alameda",
"Contra Costa",
"Marin",
"Napa",
"San Francisco",
"San Mateo",
"Santa Clara",
"Solano",
"Sonoma"
)
bay_counties <-
counties("CA", cb = T, progress_bar = F) %>%
filter(NAME %in% bay_county_names)
usa_zips <-
zctas(cb = T, progress_bar = F)
bay_zips <-
usa_zips %>%
st_centroid() %>%
.[bay_counties, ] %>%
st_set_geometry(NULL) %>%
left_join(usa_zips %>% select(GEOID10)) %>%
st_as_sf()
ca_cities <- places("CA", cb = T, progress_bar = FALSE)
bay_cities <- ca_cities[bay_counties, ]
bay_cities_within <-
ca_cities %>%
st_centroid() %>%
.[bay_counties, ] %>%
st_set_geometry(NULL) %>%
left_join(ca_cities %>% select(GEOID)) %>%
st_as_sf()
I now filter the PGE_Elec_Gas combined dataset only to Residential and Commercial gas and electricity usage corresponding to each zip code, month and year.
pge_final <-
pge_elec_gas %>%
mutate(
ZIPCODE = ZIPCODE %>% as.character()
)%>%
filter(
CUSTOMERCLASS %in%
c(
"Elec- Residential",
"Elec- Commercial",
"Gas- Residential",
"Gas- Commercial"
)
) %>%
select(
!c(COMBINED, AVERAGEKBTU)
) %>%
group_by(ZIPCODE, MONTH, YEAR, CUSTOMERCLASS) %>%
summarize(
TOTALKBTU =
sum(
TOTALKBTU,
na.rm = T
),
TOTALCUSTOMERS =
sum(
TOTALCUSTOMERS,
na.rm = T
)
) %>%
mutate(
AVERAGEKBTU =
TOTALKBTU/TOTALCUSTOMERS
) %>%
right_join(
bay_zips %>% select(GEOID10),
by = c("ZIPCODE" = "GEOID10")
) %>%
st_as_sf() %>%
st_transform(4326)
pge_final$DATE <- as.yearmon(paste(pge_final$YEAR, pge_final$MONTH), "%Y %m")
pge_final
## Simple feature collection with 44668 features and 8 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -123.5335 ymin: 36.89303 xmax: -121.3083 ymax: 38.93713
## geographic CRS: WGS 84
## # A tibble: 44,668 x 9
## # Groups: ZIPCODE, MONTH, YEAR [12,189]
## ZIPCODE MONTH YEAR CUSTOMERCLASS TOTALKBTU TOTALCUSTOMERS AVERAGEKBTU
## * <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 94002 1 2017 Elec- Commer~ 14147749. 701 20182.
## 2 94002 1 2017 Elec- Reside~ 18706205. 10553 1773.
## 3 94002 1 2017 Gas- Commerc~ 0 0 NaN
## 4 94002 1 2017 Gas- Residen~ 83483700 9373 8907.
## 5 94002 1 2018 Elec- Commer~ 13867518. 706 19642.
## 6 94002 1 2018 Elec- Reside~ 17254730. 10599 1628.
## 7 94002 1 2018 Gas- Commerc~ 12584900 446 28217.
## 8 94002 1 2018 Gas- Residen~ 64025800 9406 6807.
## 9 94002 1 2019 Elec- Commer~ 14407535. 717 20094.
## 10 94002 1 2019 Elec- Reside~ 17518716. 10628 1648.
## # ... with 44,658 more rows, and 2 more variables: geometry <MULTIPOLYGON [°]>,
## # DATE <yearmon>
The total energy consumption in kBTU is ordered by month and categorized by energy type and location in the plot below:
pge_chart <-
pge_final %>%
ggplot() +
geom_bar(
aes(
x = DATE %>% factor(),
y = TOTALKBTU,
fill = CUSTOMERCLASS
),
stat = "identity",
position = "stack"
) +
labs(
x = "Month",
y = "kBTU",
title = "Bay Area Monthly Energy Usage, PG&E 2017-2020",
fill = "Energy Type"
)
pge_final_chart <- pge_chart %>%
ggplotly() %>%
layout(
xaxis = list(fixedrange = T, tickangle = -90),
yaxis = list(fixedrange = T)
) %>%
config(displayModeBar = F)
pge_final_chart
Next, I take the PGE_Final dataset from above. To determine the effect of COVID-19 on residential energy consumption, I assume energy consumption from March to June 2019 as the baseline. The following process displays the baseline (2019) consumption and the adjusted (2020) consumption and computes the difference and percent change.
pge_bay_change <-
pge_final %>%
filter(YEAR %in% 2019:2020, MONTH %in% (3:6), CUSTOMERCLASS == "Elec- Residential") %>%
st_set_geometry(NULL) %>%
select(!c(MONTH, TOTALCUSTOMERS, DATE, AVERAGEKBTU)) %>%
pivot_wider(
names_from = YEAR,
values_from = TOTALKBTU
) %>%
rename(
kBTU_2019 = "2019",
kBTU_2020 = "2020"
) %>%
right_join(
bay_zips %>% select(GEOID10),
by = c("ZIPCODE" = "GEOID10")
) %>%
st_as_sf() %>%
st_transform(4326)
pge_bay_change$kBTU_Change <- pge_bay_change$kBTU_2020 - pge_bay_change$kBTU_2019
pge_bay_change$Percent_Change <- pge_bay_change$kBTU_Change/pge_bay_change$kBTU_2019 * 100
pge_bay_change
## Simple feature collection with 1147 features and 7 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -123.5335 ymin: 36.89303 xmax: -121.3083 ymax: 38.93713
## geographic CRS: WGS 84
## # A tibble: 1,147 x 8
## # Groups: ZIPCODE, MONTH [1,147]
## MONTH ZIPCODE CUSTOMERCLASS kBTU_2019 kBTU_2020 geometry
## * <dbl> <chr> <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 3 94002 Elec- Reside~ 15670057. 16308012. (((-122.3359 37.50805, -~
## 2 4 94002 Elec- Reside~ 13092936. 14838170. (((-122.3359 37.50805, -~
## 3 5 94002 Elec- Reside~ 13256889. 14308799. (((-122.3359 37.50805, -~
## 4 6 94002 Elec- Reside~ 12921855. 13501779. (((-122.3359 37.50805, -~
## 5 3 94005 Elec- Reside~ 2669122. 2777757. (((-122.442 37.69435, -1~
## 6 4 94005 Elec- Reside~ 2195329. 2558167. (((-122.442 37.69435, -1~
## 7 5 94005 Elec- Reside~ 2207762. 2394262. (((-122.442 37.69435, -1~
## 8 6 94005 Elec- Reside~ 2089263. 2255366. (((-122.442 37.69435, -1~
## 9 3 94010 Elec- Reside~ 34121552. 35412868. (((-122.4126 37.58916, -~
## 10 4 94010 Elec- Reside~ 28494011. 31997446. (((-122.4126 37.58916, -~
## # ... with 1,137 more rows, and 2 more variables: kBTU_Change <dbl>,
## # Percent_Change <dbl>
summary(pge_bay_change)
## MONTH ZIPCODE CUSTOMERCLASS kBTU_2019
## Min. :3.0 Length:1147 Length:1147 Min. : 0
## 1st Qu.:4.0 Class :character Class :character 1st Qu.: 3767479
## Median :4.5 Mode :character Mode :character Median :12223084
## Mean :4.5 Mean :12716828
## 3rd Qu.:5.0 3rd Qu.:19206266
## Max. :6.0 Max. :61917779
## NA's :13 NA's :20
## kBTU_2020 geometry kBTU_Change Percent_Change
## Min. : 0 MULTIPOLYGON :1147 Min. :-46740647 Min. :-83.957
## 1st Qu.: 3862411 epsg:4326 : 0 1st Qu.: 125118 1st Qu.: 3.796
## Median :13508784 +proj=long...: 0 Median : 756224 Median : 7.503
## Mean :13784902 Mean : 1119981 Mean : 8.783
## 3rd Qu.:20678221 3rd Qu.: 1684463 3rd Qu.: 12.753
## Max. :68625948 Max. : 16595473 Max. : 67.176
## NA's :16 NA's :23 NA's :79
Finally, the percent differences are visually represented through the following mapping analysis:
res_pal <- colorNumeric(
palette = "plasma",
domain =
pge_bay_change$Percent_Change
)
leaflet() %>%
addTiles() %>%
addPolygons(
data = pge_bay_change,
fillColor = ~res_pal(Percent_Change),
color = "black",
opacity = 0.5,
fillOpacity = 0.5,
weight = 1,
label = ~paste0(
round(Percent_Change),
"% Electricity Usage Change in ",
ZIPCODE
),
highlightOptions = highlightOptions(
weight = 2,
opacity = 1
)
) %>%
addLegend(
data = pge_bay_change,
pal = res_pal,
values = ~Percent_Change,
title = "Percent Change in Electricity Usage, 2019 vs. 2020"
)
The first part of this assignment determines the annual cycle of Pacific Gas and Electric (PG&E) gas and electricity usage between January 2017 and June 2020 for the 9-county San Francisco Bay Area. A visual representation of the data can be found in the resulting plot above. In summary, annual energy usage peaks in the winter due to the need to heat homes and power holiday decorations. Energy usage dips in the summer despite the warmth of the season, likely because of the relatively cooler summers experienced in the Bay Area. By observation, it appears that the COVID-19 pandemic resulted in greater energy consumption than usual.
This point is numericalized in the second part of the assignment, where I compare electricity usage in the Bay Area between 2019 and 2020, specifically in the months of March and June of both years. The goal is to determine the magnitude of the increase in electricity use as a result of the shelter-in-place order issued during the COVID-19 pandemic. Here, I assumed the four months of March 2019-June 2019 as the baseline, or the energy usage in a “normal” year. I also assumed that the number of PG&E customers remained constant (or negligibly increased/decreased) during the between 2019 and 2020. I contrast these amounts with those from March 2020-June 2020.
The analysis reveals a median increase in residential electricity use of 7.5% between 2019 and 2020 in the Bay Area. One point of interest is that two of the largest increases are located in Downtown Oakland (+32%) and Downtown San Jose (+20%). This could be explained by the proportionally higher amount of high-density/transit-oriented developments in these downtown areas. With more transit-dependent residents (seniors, for example) required to stay home during the pandemic, this likely caused the larger percentage increase in electricity consumption.